Phase 6.2: async outbound connect — eliminate 3s vCPU stall#74

Closed
dpsoft wants to merge 11 commits into smoltcp-passt-port-phase6.1-half-close from smoltcp-passt-port-phase6.2-async-connect



dpsoft commented May 5, 2026

Status: DRAFT. Stacked on PR #73 (Phase 6.1 half-close).

What this branch does

Replaces the synchronous `TcpStream::connect_timeout(addr, 3s)` on the vCPU thread with a non-blocking connect + EPOLLOUT-driven completion on the net-poll thread. The vCPU thread is never blocked on connect again.

Severity: Medium-High — today, a guest opening a connection to ONE unreachable destination freezes ALL guest networking for up to 3 seconds (the connect_timeout). DNS misconfigurations, transient NAT failures, or one slow destination among many freeze the whole pipeline.

Headline win

| Workload | Before | After |
| --- | --- | --- |
| vCPU thread blocked on `connect_timeout` | up to 3 s | <100 µs |
| Other flows during a stuck connect | also blocked | unaffected |

The new BROKEN_ON_PURPOSE pin `tcp_connect_to_unreachable_does_not_block_other_flows` flips at Task 5 (`91947a3`) when EPOLLOUT-driven completion lands.

Architecture

  • New `TcpNatState::Connecting` state.
  • Guest SYN → `socket2::Socket::new(IPV4, STREAM.nonblocking, TCP)` → `connect()` returns EINPROGRESS → insert flow with state=Connecting, register FD with `RegisterMode::Write` → return immediately to vCPU.
  • Net-poll thread sees EPOLLOUT readiness → `relay_pending_connects` checks `getsockopt(SO_ERROR)`: zero → transition to SynReceived, send SYN-ACK to guest, modify epoll Write→Read; non-zero → send RST to guest, reap.
  • `CONNECT_TIMEOUT` (3 s) reaping for stuck `Connecting` flows (silent firewall drop) — uses Phase 6.1's `last_state_change` field.
  • New `EpollDispatch::modify` (`EPOLL_CTL_MOD`) flips Write→Read on connect completion.
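
The completion step in the list above can be sketched as a small state transition. This is a minimal illustration, not the branch's actual code: `Flow`, `Interest`, `GuestReply`, and `complete_connect` are hypothetical names, and the `so_error` argument stands in for the value the real code reads via `getsockopt(SO_ERROR)`.

```rust
/// Subset of the NAT flow states named in this PR (the real enum likely has more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TcpNatState {
    Connecting,
    SynReceived,
    Closed,
}

/// Hypothetical stand-in for the epoll interest set (`RegisterMode` in the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Interest {
    Write,
    Read,
}

/// Hypothetical: what the net-poll thread should send back to the guest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GuestReply {
    SynAck,
    Rst,
}

/// Hypothetical, simplified flow-table entry.
#[derive(Debug)]
struct Flow {
    state: TcpNatState,
    interest: Interest,
}

/// Handle EPOLLOUT readiness for one flow.
/// `so_error` is 0 when the connect succeeded, otherwise an errno value.
/// Returns `None` when the flow is not in `Connecting` (nothing to do).
fn complete_connect(flow: &mut Flow, so_error: i32) -> Option<GuestReply> {
    if flow.state != TcpNatState::Connecting {
        return None;
    }
    if so_error == 0 {
        // Connected: promote the flow and flip interest Write -> Read
        // (the real code would do this with EPOLL_CTL_MOD).
        flow.state = TcpNatState::SynReceived;
        flow.interest = Interest::Read;
        Some(GuestReply::SynAck)
    } else {
        // Connect failed: mark closed so the caller can reap the entry.
        flow.state = TcpNatState::Closed;
        Some(GuestReply::Rst)
    }
}
```

Keeping the decision a pure function of `(state, so_error)` makes both branches easy to pin in unit tests without a live socket; the real `relay_pending_connects` additionally has to build the SYN-ACK or RST frame and touch epoll.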

Bench evidence

`scripts/bench-compare.sh --baseline 47868f0 --skip-vm`:

| Bench | Baseline | HEAD | Note |
| --- | --- | --- | --- |
| `process_syn_during_pending_connects/0` | | 12.8 µs | new bench |
| `process_syn_during_pending_connects/10` | | 12.6 µs | flat |
| `process_syn_during_pending_connects/100` | | 656 ns | O(1): cost doesn't scale |
| `process_syn_during_pending_connects/1000` | | 1.39 µs | with backlog size |
| `port_forward_accept_latency` | 50.1 ms | 183 µs | inherited from #72 |
| `poll_with_n_mixed_flows/999` | 304 µs | 10.1 µs | -96.7 % held |
| `tcp_bulk_throughput_1mb` | 58.8 ms | 58 ms | parity |

Wall-clock vs master

| Metric | Master | This branch | Δ |
| --- | --- | --- | --- |
| TCP g2h throughput | 1885 Mbps | 5630 Mbps | +199 % (3.0×) |
| TCP bulk-g2h | 1565 Mbps | 4940 Mbps | +216 % (3.2×) |
| TCP CRR p50 | ~10 ms | ~10.1 ms | parity |
| TCP RR p50 | 2 µs | 2 µs | parity |

What changed (10 commits)

  1. `1cbfde1` — chore: socket2 dep
  2. `b460b0b` — feat: TcpNatState::Connecting + guest_isn field
  3. `e2d54df` — test: BROKEN_ON_PURPOSE pin (flips at Task 5)
  4. `719d424` — feat: non-blocking connect on guest SYN
  5. `91947a3` — feat: EPOLLOUT-driven completion (relay_pending_connects)
  6. `13ed906` — test: pin tcp_connect_async_eventual_rst_on_failure
  7. `ecc00c0` — feat: CONNECT_TIMEOUT reaping for stuck Connecting flows
  8. `b71e2e9` — bench: process_syn_during_pending_connects parametric
  9. `95faac3` — chore: validation gate
  10. `21264b6` — fix(bench): drop unused Ipv4Address import

Validation

| Suite | Status |
| --- | --- |
| `cargo fmt --all -- --check` | |
| `cargo clippy --workspace --all-targets --all-features -- -D warnings` | |
| `cargo test --test network_baseline` | ✅ 22/22 |
| `cargo test --test network_baseline --features bench-helpers -- --test-threads=1` | ✅ 24/24 |
| `cargo test --lib network` | ✅ 23/23 |
| `cargo bench --bench network --features bench-helpers --no-run` | |
| `cargo build --release` | |
| `voidbox-network-bench --iterations 3 --bulk-mb 10` | ✅ no regression |

Open follow-ups

  • Phase 6.3 (TCP window management) — separate PR.
  • A snapshot integration test for `Connecting` state reaping — current behavior is "skip + reap" in `rebuild_epoll_from_flow_table`; documented but not yet pin-tested.

dpsoft added 11 commits May 4, 2026 16:28
9 bite-sized tasks covering the TcpStream::connect_timeout(3s)
removal from the vCPU TX path:

- New TcpNatState::Connecting state.
- Non-blocking socket via socket2 + EINPROGRESS handling.
- EPOLLOUT-driven completion in relay_pending_connects, called
  from drain_to_guest before relay_tcp_nat_data.
- getsockopt(SO_ERROR) checks the actual connect outcome on
  EPOLLOUT readiness.
- EpollDispatch::modify (EPOLL_CTL_MOD) flips Write→Read on
  successful connect.
- CONNECT_TIMEOUT (3s) reaping for stuck Connecting flows
  (silent firewall drop).
- Two new pins: connect-to-unreachable-doesn't-block-others
  (BROKEN_ON_PURPOSE → flips at Task 5) + async-RST-on-failure.
- One new bench: process_syn_during_pending_connects parametric
  on N pending connecting flows (O(1) regression gate).

Severity: MEDIUM-HIGH. Today TcpStream::connect_timeout(addr, 3s)
on the vCPU thread freezes ALL guest networking for up to 3s
when one destination is slow/unreachable.
…ndshakes

Replace the synchronous TcpStream::connect_timeout(3s) on the vCPU thread
with a non-blocking socket2 connect that returns EINPROGRESS immediately.
Flows are inserted with TcpNatState::Connecting and their fd registered for
EPOLLOUT. EPOLLOUT-driven completion (Task 5: relay_pending_connects) will
promote them to SynReceived and send SYN-ACK.  An unreachable destination
no longer blocks all other guest networking for 3 seconds.
…connects)

Add EpollDispatch::modify (EPOLL_CTL_MOD) to atomically switch a registered
fd's event interest from Write to Read without a DEL+ADD window. Add
relay_pending_connects, called from drain_to_guest before relay_tcp_nat_data,
which drives all pending Connecting flows: checks SO_ERROR, sends SYN-ACK and
transitions to SynReceived on success, or RST and Closed on failure. Update
rebuild_epoll_from_flow_table to reap Connecting entries post-snapshot (the
underlying socket fd is dead after restore). The BROKEN_ON_PURPOSE pin
tcp_connect_to_unreachable_does_not_block_other_flows now passes.

Verifies that connecting to a recently-dropped listener port eventually
delivers a RST to the guest via relay_pending_connects's SO_ERROR path.
Already passes after Task 5 lands; pinned now to guard the behavior.

Add Connecting-timeout detection to relay_tcp_nat_data's timeout sweep.
Flows stuck in Connecting for longer than CONNECT_TIMEOUT (3 s — matching
the pre-Phase-6.2 synchronous connect_timeout behavior) are reaped: a RST
is sent to the guest and the flow table entry is removed. This handles the
silent-firewall-drop case where EPOLLOUT never fires.
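
The timeout sweep described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `relay_tcp_nat_data`: the `Flow` struct, the `u16` flow key, and `reap_stuck_connects` are hypothetical; only `CONNECT_TIMEOUT` (3 s), the `Connecting` state, and the `last_state_change` field come from the PR text.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Matches the 3 s value stated in the PR.
const CONNECT_TIMEOUT: Duration = Duration::from_secs(3);

/// Subset of flow states, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TcpNatState {
    Connecting,
    Established,
}

/// Hypothetical, simplified flow-table entry.
struct Flow {
    state: TcpNatState,
    last_state_change: Instant,
}

/// Remove every flow that has sat in `Connecting` for longer than
/// CONNECT_TIMEOUT and return the removed keys (sorted) so the caller
/// can send one RST to the guest per reaped flow.
fn reap_stuck_connects(flows: &mut HashMap<u16, Flow>, now: Instant) -> Vec<u16> {
    let mut reaped: Vec<u16> = flows
        .iter()
        .filter(|(_, f)| {
            f.state == TcpNatState::Connecting
                && now.saturating_duration_since(f.last_state_change) > CONNECT_TIMEOUT
        })
        .map(|(key, _)| *key)
        .collect();
    reaped.sort_unstable();
    for key in &reaped {
        flows.remove(key);
    }
    reaped
}
```

Passing `now` in as a parameter, rather than calling `Instant::now()` inside the sweep, lets a test advance time deterministically; whether the real sweep does this is not stated in the PR.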
Add insert_synthetic_connecting_entry bench-helper to SlirpBackend and
add the process_syn_during_pending_connects parametric bench (args: 0, 10,
100, 1000 pending connects). Validates that the SYN-handler cost is O(1)
in pending-connect backlog size — only flow_table.insert + epoll.register,
both O(1).

fmt:    cargo fmt --all -- --check                                PASS
clippy: cargo clippy --workspace --all-targets --all-features    PASS
tests:  cargo test -p void-box --all-features                    PASS (353 unit tests)
        cargo test --test network_baseline                       PASS (22 tests)
        cargo test --test network_baseline --features bench-helpers
                                                                 PASS (24 tests)
        cargo test --lib network                                 PASS (25 network tests)
bench:  cargo bench --bench network --features bench-helpers --no-run
                                                                 PASS (compiles)
build:  cargo build --release                                    PASS

Wall-clock (voidbox-network-bench --iterations 3 --bulk-mb 10):
  g2h:  ~5.7 Gbps (hardware-limited; no regression from Phase 6.1)
  rr:   p50=2 µs
  crr:  p50=10.1 ms

BROKEN_ON_PURPOSE pin (tcp_connect_to_unreachable_does_not_block_other_flows)
flipped from FAIL to PASS at Task 5 (relay_pending_connects landed).

VM suite (conformance, oci_integration, snapshot_integration) skipped:
VOID_BOX_INITRAMFS not available in this environment.

The import was only consumed at the bench-helpers-gated
process_syn_during_pending_connects bench (Task 8). Default-feature
clippy --bench network failed with -D warnings because the import
is unused when bench-helpers is off.

Quickest fix: qualify the single bare-name use as
smoltcp::wire::Ipv4Address (matches the other call sites in the
file) and drop Ipv4Address from the use list.